1 Introduction

Given independent random samples of data, the difference in sample means is a common measure of disparity between two populations. When the sample sizes are large and the samples' respective distributions are reasonably well-behaved, a Normal approximation to the distribution of the mean difference is commonly employed. This approximation is justified by the Central Limit Theorem. However, in practice it is often difficult to determine the sample sizes needed to ensure reliable inference. Data arising from a highly dispersed Negative Binomial model may be extremely skewed. In many cases, a one-sample Normal approximation for a Negative Binomial mean does not provide reliable estimates, even at sample sizes typically considered sufficiently large (e.g., $n = 50$ or $100$).
Shilane et al. (2010) investigated alternative methods for one-sample inference in highly dispersed Negative Binomial models. These methods include a Bootstrap approach, tail probability bounds such as Bernstein's Inequality, and parametric methods based upon the Normal, Gamma, and Chi Square distributions. We seek to extend this analysis to the two-sample case. When each sample mean is best approximated by a Normal, Gamma, or Chi Square distribution, we will demonstrate that a Normal approximation is appropriate for two-sample inference. We will also adapt Bernstein's Inequality to generate inferences in two-sample cases for which the Normal approximation and Bootstrap methods are unreliable.



2 One-Sample Inference in Negative Binomial Models

2.1 The Negative Binomial Distribution 

A Negative Binomial variable $X$ typically models the random number of failures $k \in \{0, 1, 2, \dots\}$ observed before the $r$th success (with $r \in \mathbb{Z}^{+}$) over a series of trials. Each trial is the result of an independent, identically distributed (i.i.d.) Bernoulli random variable that results in success with probability $p$ and failure otherwise. The Negative Binomial distribution may be characterized in terms of the parameters $r$ and $p$. An alternative parameterization sets a mean parameter $\mu = r\left(\frac{1}{p} - 1\right)$ and a dispersion parameter $\theta = r$. We will adopt this parameterization for the remainder of this study. The probability mass function (Hilbe, 2007) of the Negative Binomial $NB(\mu, \theta)$ random variable $X$ is then given by

$$P(X = k) = \frac{\Gamma(k + \theta)}{\Gamma(\theta)\, k!} \left(\frac{\theta}{\theta + \mu}\right)^{\theta} \left(\frac{\mu}{\theta + \mu}\right)^{k}, \qquad k = 0, 1, 2, \dots \tag{1}$$
The Negative Binomial distribution serves as a general model for i.i.d. counting data $X_1, \dots, X_n$ with $n \in \mathbb{Z}^{+}$. The Poisson distribution, which is often used to model counts, corresponds to the limiting case $\theta \to \infty$. Under a Poisson model, the mean and variance are equal. For any finite value of $\theta$, the variance of a Negative Binomial, $\mu + \mu^2/\theta$, is greater than the mean. This dispersion grows as $\theta$ decreases. When $\theta$ is small, the distribution becomes highly skewed. Shilane et al. (2010) demonstrate that the sample mean of i.i.d. Negative Binomial random variables exhibits slow convergence to the Normal distribution. As such, a Central Limit Theorem approximation may perform poorly at moderate sample sizes (e.g., $n = 50$ or $100$).
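As a sketch of the parameterization above (assuming NumPy; the helper names `nb_moments` and `sample_nb` are hypothetical), $NB(\mu, \theta)$ maps onto NumPy's `negative_binomial(n, p)`, which counts failures before `n` successes, via $n = \theta$ and $p = \theta/(\theta + \mu)$. The moment formulas are the standard ones for this parameterization: variance $\mu + \mu^2/\theta$ and skewness $(2\mu + \theta)/\sqrt{\theta\mu(\theta + \mu)}$, so the skewness of a mean of $n$ draws shrinks only like $1/\sqrt{n}$.

```python
import numpy as np


def nb_moments(mu: float, theta: float) -> tuple[float, float, float]:
    """Mean, variance, and skewness of NB(mu, theta) (hypothetical helper)."""
    var = mu + mu**2 / theta  # exceeds the mean for any finite theta
    skew = (2 * mu + theta) / np.sqrt(theta * mu * (theta + mu))
    return mu, var, skew


def sample_nb(mu: float, theta: float, size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw NB(mu, theta) counts using NumPy's (n, p) parameterization."""
    p = theta / (theta + mu)  # success probability
    return rng.negative_binomial(theta, p, size=size)


mu, theta, n = 5.0, 0.025, 100
_, var, skew = nb_moments(mu, theta)
print(f"variance = {var:.1f}")  # 5 + 25/0.025 = 1005.0
print(f"skewness of one draw = {skew:.2f}")
print(f"skewness of a mean of {n} draws = {skew / np.sqrt(n):.2f}")

rng = np.random.default_rng(0)
x = sample_nb(mu, theta, 200_000, rng)
print(f"simulated mean = {x.mean():.2f}")
```

With $\mu = 5$ and $\theta = 0.025$, the mean of 100 draws still has skewness above 1, which is consistent with the slow convergence described above.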



2.2 Inference 

Because the Normal approximation cannot ensure reliable estimates of the mean $\mu$, Shilane et al. (2010) proposed a variety of methods for one-sample inference in highly skewed Negative Binomial models. These include approximations based upon the Gamma and Chi Square distributions, the Bootstrap Bias-Corrected and Accelerated (BCA) method (Efron and Tibshirani, 1994), and tail probability bounds such as Bernstein's Inequality. The proposed methods are largely complementary. Table 1 provides guidelines for selecting an appropriate method according to the scenario. The exact boundary at which one method overtakes another depends upon the sample size $n$ along with the parameters $\mu$ and $\theta$.

| Scenario | Preferred Method |
|---|---|
| Small $n$, small $\theta$ | Bounded Bernstein |
| Large $n$, small $\theta$ | Gamma |
| Small $n$, large $\theta$ | Normal, Bootstrap, or Bounded Bernstein |
| Large $n$, large $\theta$ | Normal or Bootstrap |
| $\mu \approx 2n\theta$ | Chi Square |

Table 1: General guidelines for selecting among the proposed methods for one-sample inference in Negative Binomial models.



3 Methods 

We seek to provide adequate methods for two-sample inference in highly dispersed Negative Binomial models. The data consist of $X = (X_1, \dots, X_{n_x})$, which are i.i.d. $NB(\mu_x, \theta_x)$, and $Y = (Y_1, \dots, Y_{n_y})$, which are i.i.d. $NB(\mu_y, \theta_y)$. The two samples $X$ and $Y$ are independent. In this setting, the difference in means $\mu_x - \mu_y$ is our parameter of interest. We will estimate this parameter with $\bar{X} - \bar{Y}$, the difference in sample means. The methods of inference will consist of estimating the distribution of $\bar{X} - \bar{Y}$ or providing appropriate probability tail bounds using Bernstein's Inequality.

Inferences about $\bar{X} - \bar{Y}$ may be obtained by the Bootstrap method. Otherwise, since inferences about $\bar{X}$ and $\bar{Y}$ may each be approximated independently by a tail probability bound or by any of the Gamma, Chi Square, and Normal distributions, the difference $\bar{X} - \bar{Y}$ falls into one of $4 \times 4 = 16$ cases. When a bound like Bernstein's Inequality is required for either sample individually, it will also be applied to the two-sample case. When both sample means are approximately Normal, the standard two-sample Normal approximation may be applied. Since the Chi Square distribution is a special case of the Gamma, the remaining cases only require ascertaining the distribution of the difference of two Gammas or that of one Gamma and one Normal. The following subsections will adapt Bernstein's Inequality to the two-sample case and show that any difference of Gamma and Normal variables is approximately Normal.
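The case bookkeeping can be written as a small lookup. This is a hypothetical illustration, not code from the study: the labels and the function name `two_sample_method` are assumptions, and the rule simply encodes the text, where a Bernstein bound needed for either sample carries over to the difference and every Normal, Gamma, or Chi Square pairing is treated as approximately Normal.

```python
from itertools import product

# Per-sample approximations for the distribution of a sample mean.
ONE_SAMPLE_METHODS = ("Bernstein", "Gamma", "Chi Square", "Normal")


def two_sample_method(method_x: str, method_y: str) -> str:
    """Hypothetical rule mirroring the 16-case analysis in the text."""
    if "Bernstein" in (method_x, method_y):
        # A tail bound required for either sample is carried over to Xbar - Ybar.
        return "Bernstein"
    # Every Normal / Gamma / Chi Square pairing is approximately Normal.
    return "Normal"


cases = {pair: two_sample_method(*pair) for pair in product(ONE_SAMPLE_METHODS, repeat=2)}
print(len(cases))                                     # 16
print(sum(m == "Normal" for m in cases.values()))     # 9
print(sum(m == "Bernstein" for m in cases.values()))  # 7
```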



3.1 Bernstein's Inequality 

When at least one of the sample sizes $n_x$ or $n_y$ is sufficiently small, the distribution of the respective sample mean $\bar{X}$ or $\bar{Y}$ is not well approximated by a Gamma or Normal distribution. The Chi Square model applies if $\mu \approx 2n\theta$. In all other cases, we must rely upon probability tail bounds to perform inference on $\bar{X} - \bar{Y}$. Shilane et al. (2010) recommend a bounded variant of Bernstein's Inequality for the one-sample setting. We will briefly review the one-sample Bernstein method and then introduce an extension for the two-sample setting.



Let $Z = (Z_1, \dots, Z_n)$ be independent random variables bounded in a range $(a, b)$ with $a < b$, and let $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\mathrm{Var}(Z_i)$. Bernstein's Inequality (Rosenblum and van der Laan, 2008; Shilane et al., 2010) states that

$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n}\left(Z_i - E[Z_i]\right)\right| > \epsilon\right) \le 2\exp\left(-\frac{n\epsilon^2/2}{\sigma^2 + \epsilon(b-a)/3}\right). \tag{2}$$

When the right side of Equation (2) is set equal to $\alpha$, we can construct a $1 - \alpha$ confidence interval for $E[\bar{Z}]$. Such an interval will have the form $\bar{Z} \pm \epsilon$, where $\epsilon$ is the positive root

$$\epsilon = \frac{-\frac{2}{3}(b-a)\log(\alpha/2) + \sqrt{\frac{4}{9}(b-a)^2[\log(\alpha/2)]^2 - 8n\sigma^2\log(\alpha/2)}}{2n}. \tag{3}$$
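Equations (2) and (3) translate directly into code. The sketch below (plain Python; the function names and numeric inputs are illustrative) computes the positive root $\epsilon$ and confirms that substituting it back into the right side of Equation (2) returns $\alpha$.

```python
import math


def bernstein_bound(n, sigma2, a, b, eps):
    """Right side of Equation (2): bound on P(|Zbar - E[Zbar]| > eps)."""
    return 2.0 * math.exp(-(n * eps**2 / 2.0) / (sigma2 + eps * (b - a) / 3.0))


def bernstein_halfwidth(n, sigma2, a, b, alpha):
    """Positive root of Equation (3): half-width of the 1 - alpha interval."""
    ell = math.log(alpha / 2.0)  # negative whenever alpha < 2
    r = b - a
    disc = (4.0 / 9.0) * r**2 * ell**2 - 8.0 * n * sigma2 * ell
    return (-(2.0 / 3.0) * r * ell + math.sqrt(disc)) / (2.0 * n)


# Illustrative inputs only: n = 100, sigma^2 = 1005, range (0, 500), alpha = 0.05.
eps = bernstein_halfwidth(n=100, sigma2=1005.0, a=0.0, b=500.0, alpha=0.05)
print(f"epsilon = {eps:.3f}")
print(round(bernstein_bound(100, 1005.0, 0.0, 500.0, eps), 6))  # 0.05
```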



The two-sample case can be adapted to the form of the one-sample version of Bernstein's Inequality. Consider the following transformation of the data: let $n = n_x + n_y$, and define $Z_1, \dots, Z_n$ as

$$Z_i = \begin{cases} \dfrac{n}{n_x}\, X_i & \text{if } i \in \{1, \dots, n_x\}; \\[1ex] -\dfrac{n}{n_y}\, Y_{i - n_x} & \text{if } i \in \{n_x + 1, \dots, n\}. \end{cases} \tag{4}$$

The data set $Z$ is constructed so that $\bar{Z} = \bar{X} - \bar{Y}$. Therefore, $E[\bar{Z}] = \mu_x - \mu_y$ and $\mathrm{Var}(\bar{Z}) = \sigma_x^2/n_x + \sigma_y^2/n_y$, where $\sigma_x^2$ and $\sigma_y^2$ denote the variances of the individual $X_i$ and $Y_j$. Since $Z_1, \dots, Z_n$ are independent, bounded variables, the version of Bernstein's Inequality given by Equation (2) may be applied. The bounding range $(a, b)$ may be specified in terms of the maximum values of the two data sets. Once the sample size $n$, variance $\sigma^2$, and bounding range $(a, b)$ are specified, Bernstein's Inequality may be applied. These parameters are:

$$\begin{aligned} n &= n_x + n_y; \\ \sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}\mathrm{Var}(Z_i) = n\left(\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right); \\ a &= -c_a\,\frac{n}{n_y}\max(Y_1, \dots, Y_{n_y}), \text{ with } c_a = 1 \text{ by default}; \\ b &= c_b\,\frac{n}{n_x}\max(X_1, \dots, X_{n_x}), \text{ with } c_b = 1 \text{ by default}. \end{aligned} \tag{5}$$
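A minimal sketch of the two-sample construction, assuming NumPy. It builds $Z$ from Equation (4), takes $n$, $\sigma^2$, $a$, and $b$ from Equation (5), and returns $\bar{Z} \pm \epsilon$. Replacing $\sigma_x^2$ and $\sigma_y^2$ by the sample variances $s_x^2$ and $s_y^2$ is an assumption of this sketch, as are the function names and the simulated inputs.

```python
import math

import numpy as np


def bernstein_halfwidth(n, sigma2, a, b, alpha):
    """Positive root of Equation (3)."""
    ell = math.log(alpha / 2.0)
    r = b - a
    disc = (4.0 / 9.0) * r**2 * ell**2 - 8.0 * n * sigma2 * ell
    return (-(2.0 / 3.0) * r * ell + math.sqrt(disc)) / (2.0 * n)


def two_sample_bernstein_ci(x, y, alpha=0.05, c_a=1.0, c_b=1.0):
    """Bernstein interval for mu_x - mu_y built from the Z data of Equation (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    # Equation (4): rescale X, and negate and rescale Y, so that mean(z) = xbar - ybar.
    z = np.concatenate([(n / n_x) * x, -(n / n_y) * y])
    # Equation (5), with sample variances standing in for sigma_x^2, sigma_y^2 (assumption).
    sigma2 = n * (x.var(ddof=1) / n_x + y.var(ddof=1) / n_y)
    a = -c_a * (n / n_y) * y.max()
    b = c_b * (n / n_x) * x.max()
    eps = bernstein_halfwidth(n, sigma2, a, b, alpha)
    zbar = z.mean()
    return zbar - eps, zbar + eps, zbar


rng = np.random.default_rng(1)
x = rng.negative_binomial(0.05, 0.05 / (0.05 + 5.0), size=40)     # NB(mu=5, theta=0.05)
y = rng.negative_binomial(0.025, 0.025 / (0.025 + 10.0), size=60)  # NB(mu=10, theta=0.025)
lower, upper, zbar = two_sample_bernstein_ci(x, y)
print(f"xbar - ybar = {x.mean() - y.mean():.3f}, zbar = {zbar:.3f}")
print(f"95% Bernstein interval: ({lower:.2f}, {upper:.2f})")
```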



Applying these parameters to Equation (3), a $1 - \alpha$ confidence interval for $\mu_x - \mu_y$ is given by $\bar{X} - \bar{Y} \pm \epsilon$. Furthermore, a test of the null hypothesis $H_0: \mu_x - \mu_y = w$ versus the two-sided alternative $H_a: \mu_x - \mu_y \ne w$ can also be performed using Bernstein's Inequality. In this case, the value of $\epsilon$ is given by $|\bar{X} - \bar{Y} - w|$. Then the p-value for this test is the value of $\alpha$ solving Equation (3), or equivalently the right side of Equation (2) evaluated at this $\epsilon$:

$$p = 2\exp\left(-\frac{n\epsilon^2}{2\sigma^2 + \frac{2}{3}\epsilon(b-a)}\right). \tag{6}$$
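The p-value of Equation (6) can be sketched the same way (plain Python; the function names and parameter values are hypothetical). The round trip below checks that a difference lying exactly at the edge of the 95% Bernstein interval receives a p-value of 0.05. Capping the result at 1 is an added assumption, since the right side of Equation (2) exceeds 1 for small $\epsilon$.

```python
import math


def bernstein_halfwidth(n, sigma2, a, b, alpha):
    """Positive root of Equation (3), used here only to check the round trip."""
    ell = math.log(alpha / 2.0)
    r = b - a
    disc = (4.0 / 9.0) * r**2 * ell**2 - 8.0 * n * sigma2 * ell
    return (-(2.0 / 3.0) * r * ell + math.sqrt(disc)) / (2.0 * n)


def bernstein_p_value(diff, w, n, sigma2, a, b):
    """Equation (6): two-sided p-value for H0: mu_x - mu_y = w."""
    eps = abs(diff - w)
    p = 2.0 * math.exp(-n * eps**2 / (2.0 * sigma2 + (2.0 / 3.0) * eps * (b - a)))
    return min(1.0, p)  # assumption: cap the bound at 1 so it is a valid p-value


# Illustrative parameters: a difference at the 95% interval's edge gives p = 0.05.
n, sigma2, a, b = 100, 1005.0, -2000.0, 1500.0
eps = bernstein_halfwidth(n, sigma2, a, b, alpha=0.05)
print(round(bernstein_p_value(3.0 + eps, 3.0, n, sigma2, a, b), 6))  # 0.05
print(bernstein_p_value(3.0, 3.0, n, sigma2, a, b))                  # 1.0
```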
One caveat to the proposed use of Bernstein's Inequality is that Negative Binomial variables are in fact unbounded above. Any selected bounding range $(a, b)$ will be at best a heuristic assumption. Shilane et al. (2010) considered both bounded (Rosenblum and van der Laan, 2008) and unbounded (Birgé and Massart, 1998) variants of Bernstein's Inequality. The Bounded Bernstein method for one-sample inference proved to be a useful tool at small sample sizes in simulation studies. However, the Unbounded Bernstein method was not able to generate inferences of reasonable quality because its tail probability bound was not sufficiently sharp. There are limited guidelines for selecting $(a, b)$. At minimum, the respective samples' maximum values could be selected; that is, the constants $c_a$ and $c_b$ should be at least one.

A variety of other tail probability bounds may be employed in place of Bernstein's Inequality. These include other varieties of Bernstein's Inequality (Bernstein, 1934), Bennett's Inequality (Bennett, 1962, 1963), Hoeffding's Method (Hoeffding, 1963), McDiarmid's Inequality (Kutin, 2002; McDiarmid, 1989), and the Berry-Esseen Inequality (Berry, 1941; Esseen, 1942, 1956; van Beek, 1972).



3.2 Parametric Approaches 

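When both samples' respective means can be modeled with a Chi Square, Gamma, or Normal distribution, the difference in sample means will be approximately Normal. We can establish this by considering the Laplace transform $\mathcal{L}_W(\lambda) = E[e^{\lambda W}]$ of each possible pair of distributions. As an example, suppose $\bar{X}$ is approximately $\mathrm{Gamma}\left(n_x\theta_x, \frac{\mu_x}{n_x\theta_x}\right)$ and $\bar{Y}$ is approximately $\mathrm{Gamma}\left(n_y\theta_y, \frac{\mu_y}{n_y\theta_y}\right)$ in the shape-scale parameterization. Since $\bar{X}$ and $\bar{Y}$ are independent, the Laplace transform of $\bar{X} - \bar{Y}$ is

$$\mathcal{L}_{\bar{X}-\bar{Y}}(\lambda) = \mathcal{L}_{\bar{X}}(\lambda)\,\mathcal{L}_{\bar{Y}}(-\lambda) = \left(1 - \frac{\mu_x\lambda}{n_x\theta_x}\right)^{-n_x\theta_x}\left(1 + \frac{\mu_y\lambda}{n_y\theta_y}\right)^{-n_y\theta_y}. \tag{7}$$

The natural logarithm of this transform is then

$$\log\left(\mathcal{L}_{\bar{X}-\bar{Y}}(\lambda)\right) = -n_x\theta_x\log\left(1 - \frac{\mu_x\lambda}{n_x\theta_x}\right) - n_y\theta_y\log\left(1 + \frac{\mu_y\lambda}{n_y\theta_y}\right). \tag{8}$$

Using the second-order Taylor series approximation $\log(1 + v) \approx v - \frac{v^2}{2}$, Equation (8) is approximately

$$\log\left(\mathcal{L}_{\bar{X}-\bar{Y}}(\lambda)\right) \approx (\mu_x - \mu_y)\lambda + \frac{\lambda^2}{2}\left(\frac{\mu_x^2}{n_x\theta_x} + \frac{\mu_y^2}{n_y\theta_y}\right). \tag{9}$$

Exponentiating both sides of Equation (9) shows that the Laplace transform of $\bar{X} - \bar{Y}$ is approximately that of a Normal distribution with mean $\mu_x - \mu_y$ and variance $\frac{\mu_x^2}{n_x\theta_x} + \frac{\mu_y^2}{n_y\theta_y}$. That is, $\bar{X} - \bar{Y}$ is approximately $N\left(\mu_x - \mu_y, \frac{\mu_x^2}{n_x\theta_x} + \frac{\mu_y^2}{n_y\theta_y}\right)$.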
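A numerical check of the Taylor step, assuming the Gamma shapes $k_x = n_x\theta_x$ and $k_y = n_y\theta_y$ used above (plain Python; the function names and parameter values are illustrative). It compares the exact log-transform of Equation (8) with the quadratic of Equation (9); the gap comes from the omitted third- and higher-order terms and grows as $\lambda$ moves away from zero.

```python
import math


def exact_log_transform(lam, mu_x, mu_y, k_x, k_y):
    """Equation (8): log E[exp(lam * (Xbar - Ybar))] under the Gamma approximations."""
    return -k_x * math.log(1 - mu_x * lam / k_x) - k_y * math.log(1 + mu_y * lam / k_y)


def quadratic_log_transform(lam, mu_x, mu_y, k_x, k_y):
    """Equation (9): the second-order (Normal) approximation."""
    var = mu_x**2 / k_x + mu_y**2 / k_y
    return (mu_x - mu_y) * lam + 0.5 * var * lam**2


# Illustrative values: n_x = n_y = 100 and theta_x = theta_y = 0.025, so k_x = k_y = 2.5.
mu_x, mu_y, k_x, k_y = 5.0, 10.0, 2.5, 2.5
for lam in (0.01, 0.05, 0.1):
    exact = exact_log_transform(lam, mu_x, mu_y, k_x, k_y)
    approx = quadratic_log_transform(lam, mu_x, mu_y, k_x, k_y)
    print(f"lambda={lam:<5} exact={exact:+.5f} approx={approx:+.5f} gap={exact - approx:+.2e}")
```

Equation (8) is only defined for $\lambda < n_x\theta_x/\mu_x$ (here 0.5), so the values of $\lambda$ above stay well inside that range.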



Similar arguments may be applied when one of $\bar{X}$ or $\bar{Y}$ is approximately Normal and the other is Gamma or Chi Square. The difference $\bar{X} - \bar{Y}$ will be approximately Normal for all 9 parametric combinations. The parameters of these Normal distributions are given in Table 2. In all other circumstances, inference may be obtained using the Bootstrap method or an appropriate tail probability bound such as Bernstein's Inequality.

In all cases, the mean difference $\mu_x - \mu_y$ is estimated by the statistic $\bar{X} - \bar{Y}$. The variance of the sample mean difference depends upon the mean and dispersion parameters of the two samples.

Table 2: The distribution of $\bar{X} - \bar{Y}$. The rows represent $X$ and the columns represent $Y$.

Two estimation methods may be considered. One approach would consist of first estimating the dispersion parameters $\theta_x$ and $\theta_y$ and then plugging these estimates into the appropriate scenario in Table 2. The second approach is to directly estimate the sample variances with the statistics $s_x^2$ and $s_y^2$. This approach directly estimates the variance parameter without relying upon estimates of the nuisance parameters $\theta_x$ and $\theta_y$. Therefore, the estimated variance of $\bar{X} - \bar{Y}$ is the familiar $s_x^2/n_x + s_y^2/n_y$.
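The direct-variance approach leads to the usual two-sample Normal interval. A sketch assuming NumPy and the standard library's `statistics.NormalDist`; the function name and the toy data are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def two_sample_normal_ci(x, y, alpha=0.05):
    """Normal interval for mu_x - mu_y with estimated variance s_x^2/n_x + s_y^2/n_y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return diff - z * se, diff + z * se


lo, hi = two_sample_normal_ci([0, 0, 3, 12, 40], [0, 1, 0, 25])
print(f"95% Normal interval: ({lo:.2f}, {hi:.2f})")
```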

We recommend the latter approach of directly estimating $s_x^2$ and $s_y^2$, especially in light of the difficulty of estimating small values of $\theta_x$ and $\theta_y$. These dispersion parameters can be estimated through the method of moments (Pieters et al., 1977; Shilane et al., 2010) or numeric maximum likelihood estimation (MLE) procedures (Clark and Perry, 1989; Piegorsch, 1990). Pieters et al. (1977) provide a comparison of these procedures. However, the MLE does not necessarily exist (Aragon et al., 1992; Ferreri, 1997). In practice, maximum likelihood estimates are either highly variable or generate computational errors in software implementations when the dispersion parameter is very small. Meanwhile, the method of moments estimator results in negative estimates of the strictly positive dispersion parameter when the data's sample variance is less than the sample mean. Even if these difficulties were resolved, direct estimation is typically more efficient than plug-in estimators. With these considerations in mind, we will rely upon direct estimates of the sample variances and avoid unnecessary estimation of the dispersion parameters.
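To illustrate the method-of-moments failure mode, the sketch below solves $s^2 = \bar{x} + \bar{x}^2/\theta$ for $\theta$, assuming NumPy and the unbiased sample variance; the function name and data are hypothetical. An underdispersed sample yields a negative estimate of a parameter that must be positive.

```python
import numpy as np


def theta_method_of_moments(x):
    """Method-of-moments dispersion estimate from Var = mu + mu^2 / theta (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    m, s2 = x.mean(), x.var(ddof=1)
    return m**2 / (s2 - m)  # negative when s2 < m; undefined when s2 == m


print(theta_method_of_moments([0, 0, 3, 12, 40]))   # overdispersed sample: positive
print(theta_method_of_moments([2, 3, 2, 3, 2, 3]))  # underdispersed sample: negative
```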



3.3 A Mixture Method 

In general, we expect the Bernstein method to produce more conservative and considerably wider confidence intervals than the Normal approximation. As such, these techniques may be used in a complementary fashion. When the sample sizes are small and the dispersion is high, Bernstein confidence intervals will be more reliable. At larger sample sizes and more moderate dispersions, the Normal approximation should be sufficient. We also propose a Mixture method that averages the lower ($L$) and upper ($U$) end-points of the two intervals. Such a method may produce improvements in boundary settings in which the Normal approximation is gaining in reliability but is still insufficient for inference. Other weighted combinations may be considered of the form

$$(L_{\mathrm{Mixture}}, U_{\mathrm{Mixture}}) = w\,(L_{\mathrm{Normal}}, U_{\mathrm{Normal}}) + (1 - w)\,(L_{\mathrm{Bernstein}}, U_{\mathrm{Bernstein}}), \quad w \in [0, 1]. \tag{10}$$

We will set $w = 0.5$ as a default, which corresponds to averaging the Normal and Bernstein intervals.
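Equation (10) amounts to an endpoint-wise weighted average. A minimal sketch in plain Python (the function name and interval values are hypothetical):

```python
def mixture_interval(normal, bernstein, w=0.5):
    """Equation (10): endpoint-wise weighted average of the Normal and Bernstein intervals."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    (l_n, u_n), (l_b, u_b) = normal, bernstein
    return (w * l_n + (1 - w) * l_b, w * u_n + (1 - w) * u_b)


print(mixture_interval((-2.0, 6.0), (-10.0, 14.0)))  # (-6.0, 10.0)
```

Because both endpoints use the same weight, the Mixture interval's length is $w$ times the Normal length plus $1 - w$ times the Bernstein length, so it always lies between the two.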



4 Simulation Studies 



| Parameter | Values |
|---|---|
| $\mu_x$ | $\{5, 10\}$ |
| $\mu_y$ | $\{5, 10\}$ |
| $\theta_x$ | $\{0.01, 0.025, 0.05, 0.075, 0.1\}$ |
| $\theta_y$ | $\{0.01, 0.025, 0.05, 0.075, 0.1\}$ |
| $n_x$ | $\{10, 20, 30, \dots, 180, 190, 200, 250, 500, 1000\}$ |
| $n_y$ | $\{10, 20, 30, \dots, 180, 190, 200, 250, 500, 1000\}$ |
| Trials | 10000 |

Table 3: Parameter values for the simulation experiments of Section 4. Each choice of sample sizes $n_x$ and $n_y$, means $\mu_x$ and $\mu_y$, and dispersions $\theta_x$ and $\theta_y$ comprised an independent simulation experiment. A total of 10000 confidence intervals were randomly generated for each experiment. Coverage probabilities were estimated by the empirical proportion of confidence intervals containing the true mean difference $\mu_x - \mu_y$.



We assessed the quality of the proposed Normal, Bernstein, and Mixture confidence intervals in a simulation study. We selected a wide array of two-sample inference problems in highly dispersed Negative Binomial models. The parameter values for this simulation, which are summarized in Table 3, include a variety of sample sizes from small to large at dispersions ranging from large to extremely high over several combinations of means. Each choice of sample sizes $n_x$ and $n_y$, means $\mu_x$ and $\mu_y$, and dispersions $\theta_x$ and $\theta_y$ comprised an independent simulation experiment. Each experiment randomly generated a total of 10000 pairs of data sets, each consisting of $n_x$ i.i.d. $NB(\mu_x, \theta_x)$ and $n_y$ i.i.d. $NB(\mu_y, \theta_y)$ random variables. With $\alpha = 0.05$, 95% confidence intervals for the mean difference $\mu_x - \mu_y$ were constructed on each of the 10000 pairs of data sets according to the Normal, Bernstein, and Mixture methods of the previous section. Each method's coverage probability in an experiment was estimated by the empirical proportion of the 10000 confidence intervals that contained the true mean difference $\mu_x - \mu_y$. The standard error for this estimate is given by $\sqrt{p_c(1 - p_c)/10000}$, where $p_c$ is the true coverage probability. When $p_c = 0.95$, the 10000 repetitions ensure that the estimated coverage has a margin of error of approximately 0.004 = 0.4%. Under the most extreme case of $p_c = 0.5$, which maximizes the standard error, this margin of error would be approximately 0.01 = 1%.
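The coverage estimate and its margin of error can be sketched as follows, assuming NumPy and taking the margin of error as 1.96 standard errors, which reproduces the 0.4% and 1% figures above. Only the Normal interval is simulated here; the function names, the seed, and the reduced default of 2000 trials (the study used 10000) are illustrative choices, and the Bernstein and Mixture intervals would slot into the same loop.

```python
import math
from statistics import NormalDist

import numpy as np


def coverage_margin(p_c, trials=10_000, level=0.95):
    """Margin of error (a level-based multiple of the standard error) of a coverage estimate."""
    se = math.sqrt(p_c * (1.0 - p_c) / trials)
    return NormalDist().inv_cdf(0.5 + level / 2.0) * se


def simulate_normal_coverage(mu_x, mu_y, theta_x, theta_y, n_x, n_y,
                             trials=2_000, alpha=0.05, seed=0):
    """Monte Carlo coverage of the two-sample Normal interval (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    truth = mu_x - mu_y
    hits = 0
    for _ in range(trials):
        # NumPy's (n, p) form: n = theta, p = theta / (theta + mu).
        x = rng.negative_binomial(theta_x, theta_x / (theta_x + mu_x), size=n_x)
        y = rng.negative_binomial(theta_y, theta_y / (theta_y + mu_y), size=n_y)
        diff = x.mean() - y.mean()
        se = math.sqrt(x.var(ddof=1) / n_x + y.var(ddof=1) / n_y)
        hits += int(abs(diff - truth) <= z * se)
    return hits / trials


print(round(coverage_margin(0.95), 4))  # 0.0043
print(round(coverage_margin(0.50), 4))  # 0.0098
print(simulate_normal_coverage(5, 10, 0.05, 0.025, n_x=80, n_y=50))
```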

This coverage probability estimation procedure was repeated across the 52900 simulation experiments defined by all unique combinations of the parameter values listed in Table 3. The Bootstrap method was not employed in this simulation because of its heavy computational burden. Each experiment entailed the generation of $10000(n_x + n_y)$ random variables. Over the 52900 experiments, this amounted to a total of approximately $7.84 \cdot 10^{\ldots}$ random numbers. All told, the simulation required approximately two days of continuous computation to ascertain the quality of the Bernstein, Normal, and Mixture methods. If the Bootstrap method were included with $B$ Bootstrap replicates, each of the 10000 pairs of data sets in an experiment would require an additional $B(n_x + n_y)$ resampled values. If $B$ were set to 10000 or more to ensure reliable Bootstrap inferences, this simulation would be considered intractable.
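The count of 52900 experiments follows from Table 3: $2 \times 2$ mean pairs, $5 \times 5$ dispersion pairs, and $23 \times 23$ sample-size pairs. A short enumeration in plain Python (variable names illustrative) confirms it and shows the range of random draws per experiment.

```python
from itertools import product

means = (5, 10)
dispersions = (0.01, 0.025, 0.05, 0.075, 0.1)
sample_sizes = tuple(range(10, 201, 10)) + (250, 500, 1000)

# One experiment per (mu_x, mu_y, theta_x, theta_y, n_x, n_y) combination.
grid = list(product(means, means, dispersions, dispersions, sample_sizes, sample_sizes))
print(len(sample_sizes))  # 23
print(len(grid))          # 52900

# Random NB draws per experiment: 10000 pairs of data sets of sizes n_x and n_y.
draws_per_experiment = [10_000 * (n_x + n_y) for *_, n_x, n_y in grid]
print(min(draws_per_experiment), max(draws_per_experiment))  # 200000 20000000
```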





| Method | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|---|
| Bernstein | 3.58 | 20.86 | 28.13 | 29.24 | 35.88 | 69.24 |
| Mixture | 2.87 | 16.41 | 22.19 | 22.90 | 28.03 | 53.30 |
| Normal | 2.16 | 11.92 | 16.13 | 16.55 | 20.18 | 37.60 |
| Bernstein - Normal | 1.42 | 8.85 | 11.96 | 12.69 | 15.71 | 33.51 |

Table 4: Summary of the median length of each method's confidence interval across all simulation experiments. The Bernstein - Normal row provides summary information for the length difference of the two intervals.



As an example, Figures 1, 2, and 3 provide coverage probabilities for all combinations of sample sizes in the case of $\mu_x = \mu_y = 5$ and $\theta_x = \theta_y = 0.025$. The Normal results in Figure 1 exhibit accurate coverage over a large portion of the sample sizes considered. The Normal approximation performs best in a region surrounding the main diagonal, and its coverage only drops off as the disparity between the two sample sizes grows. In cases of differing dispersions, this axis of symmetry shifts. Figure 4 displays the Normal simulation results for the case of $\mu_x = 5$, $\mu_y = 10$, $\theta_x = 0.05$, and $\theta_y = 0.025$. Notice in this plot that sample sizes $n_x = 80$ and $n_y = 50$ cannot ensure a coverage probability of even 0.75, although the $X$ sample is drawn from the more moderate dispersion of $\theta_x = 0.05$.

[Plot: Normal method coverage probability. Legend: coverage probability < 0.50; 0.50 to 0.75; 0.75 to 0.85; 0.85 to 0.90; 0.90 to 0.95; >= 0.95.]
Figure 1: Simulation results for the Normal approximation with $\mu_x = \mu_y = 5$ and $\theta_x = \theta_y = 0.025$ across all considered sample size combinations. These results may be directly compared to those of the Bernstein method in Figure 2 or the Mixture method in Figure 3.

[Plot: Bernstein method coverage probability (same legend as Figure 1).]
Figure 2: Simulation results for the Bernstein method with $\mu_x = \mu_y = 5$ and $\theta_x = \theta_y = 0.025$ across all considered sample size combinations. These results may be directly compared to those of the Normal approximation in Figure 1 or the Mixture method in Figure 3.

[Plot: Mixture method coverage probability (same legend as Figure 1).]
Figure 3: Simulation results for the Mixture method with $\mu_x = \mu_y = 5$ and $\theta_x = \theta_y = 0.025$ across all considered sample size combinations. These results may be directly compared to those of the Normal approximation in Figure 1 or the Bernstein method in Figure 2.

[Plot: Normal method coverage probability (same legend as Figure 1).]
Figure 4: Simulation results for the Normal approximation with $\mu_x = 5$, $\mu_y = 10$, $\theta_x = 0.05$, and $\theta_y = 0.025$ across all considered sample size combinations. These results may be directly compared to those of the Bernstein method in Figure 5 or the Mixture method in Figure 6.

[Plot: Bernstein method coverage probability (same legend as Figure 1).]
Figure 5: Simulation results for the Bernstein method with $\mu_x = 5$, $\mu_y = 10$, $\theta_x = 0.05$, and $\theta_y = 0.025$ across all considered sample size combinations. These results may be directly compared to those of the Normal approximation in Figure 4 or the Mixture method in Figure 6.

[Plot: Mixture method coverage probability (same legend as Figure 1).]
Figure 6: Simulation results for the Mixture method with $\mu_x = 5$, $\mu_y = 10$, $\theta_x = 0.05$, and $\theta_y = 0.025$ across all considered sample size combinations. These results may be directly compared to those of the Normal approximation in Figure 4 or the Bernstein method in Figure 5.

The two-sample Normal approximation appears to be considerably more robust than its one-sample counterpart. Consider a one-sample case of $\mu = 5$, $\theta = 0.025$, and $n = 100$ versus the two-sample case of $\mu_x = \mu_y = 5$ and $\theta_x = \theta_y = 0.025$ with $n_x = n_y = 50$. In either case, 100 i.i.d. data points are collected from the same distribution. Shilane et al. (2010) showed that applying a Normal approximation to estimate $\mu$ in the one-sample case resulted in a coverage of 0.7802. Meanwhile, the two-sample Normal confidence interval covered the mean difference $\mu_x - \mu_y$ with probability 0.9822. (In this case, the one-sample data's mean is approximately Chi Square because $\mu = 2n\theta$. The Chi Square method covers $\mu$ at a rate of 0.9414.) Even though the Normal approximation does not provide a good estimate for the one-sample data, its performance improves considerably in the two-sample case. This trend generally holds across the entirety of the two-sample simulation experiments.

When the Normal approximation fails to provide adequate coverage, the Bernstein method may be used as an alternative. Figure 5 shows a broad range of sample sizes at which Bernstein confidence intervals improve upon the performance of the Normal approximation displayed in Figure 4. Furthermore, the Mixture method, which averages the Normal and Bernstein intervals, extends the range of sample sizes with good coverage before its intervals begin to over-cover the mean as considerably as the Bernstein intervals do (Figure 6).

Table 4 provides summary information for the median length of each method's confidence interval across all simulation experiments. It also contains a summary of the difference in length between the Bernstein and Normal methods. As expected, the Bernstein confidence intervals are considerably wider than the corresponding Normal intervals; in many cases the Bernstein intervals are roughly double the length of the corresponding Normal interval. When the Normal approximation performs poorly in terms of coverage, the wider Bernstein interval often provides an inference of higher quality. When the Normal method performs well, the Bernstein confidence intervals will considerably over-cover the mean. We can define the preference boundary as the set of parameter values at which the Normal approximation overtakes the Bernstein method in terms of its coverage quality. (For instance, this could be the point at which the Normal's coverage becomes closer to $1 - \alpha$ than the Bernstein method's coverage.) In plots such as Figure 4, with $\mu_x$, $\mu_y$, $\theta_x$, and $\theta_y$ fixed, this boundary roughly takes the form of an ellipse defined on the sample sizes. Within a neighborhood of this boundary, the Mixture method will outperform both the Bernstein and Normal methods.



5 Discussion 

At small values of $\theta$, Negative Binomial models produce highly skewed data that cause difficulties in drawing appropriate inferences about the mean. The Normal approximation often performs poorly in one-sample settings. However, Normal inferences on the two-sample mean difference $\mu_x - \mu_y$ are considerably more robust and can perform well even when neither individual sample mean is approximately Normal. Tail probability bounds such as Bernstein's Inequality, along with the Normal-Bernstein Mixture method, provide complementary procedures. Even under extreme dispersion at small sample sizes, the Bernstein method often performs well. Although it is a conservative bound, Bernstein's Inequality indicates that the Normal approximation's confidence intervals are often too narrow in these settings. The Mixture method is intended to provide confidence intervals of intermediate length. Indeed, an appropriately weighted combination of the Normal and Bernstein intervals can be constructed to produce a length anywhere in between the component results.

When at least one of the two sample means is approximately Gamma or Chi Square, the Normal approximation was justified by a second-order Taylor series expansion of the cumulant function (the natural logarithm of the Laplace transform). Future work could focus on further expanding this Taylor series. We could examine the impact of third-order terms on the coverage of the Normal approximation and examine the drop-off in accuracy as $\alpha$ decreases. Such an analysis would better justify inferences that run deeper into the tails of the distribution of $\bar{X} - \bar{Y}$, where the Normal approximation may become less accurate.



We further emphasize that the a priori selection of the sample sizes $n_x$ and $n_y$ for controlled experiments is a difficult problem. This is especially true in highly dispersed Negative Binomial models. The simulation results suggest that, when the two dispersions are equal, roughly equal sample sizes are preferable. In other cases, some degree of imbalance would be preferred. In selecting among the Normal, Bernstein, and Mixture methods, we offer the following limited guidelines: the Bernstein method is typically preferred when the disparity in the sample sizes is large, especially for high dispersions. In more moderate cases, the Normal approximation is generally preferred. Finally, the Mixture method allows for the possibility of improvements over the Bernstein and Normal methods along the preference boundary.

The Bootstrap method was not included in the simulation study of Section 4 due to its burdensome computational requirements. In the previous work of Shilane et al. (2010), the Bootstrap BCA method was shown to produce results similar to those of the Normal approximation in one-sample settings. Because the Normal approximation performs well in a greater variety of two-sample settings, the quality of two-sample Bootstrap inferences merits further investigation.